BEP-14: GPU Numpy

Abstract:
	To implement GPU operations in a transparent way, an idea would be to have a package that
	mimicks all Numpy functionality with a new data type, gpuarray (or possibly a new dtype).
	There seems to be a similar attempt named GpuPy that has not been publicly released.
	(http://www.tricity.wsu.edu/~bobl/personal/mypubs/2009_gpupy_toms.pdf).

Examples
========
from gpunumpy import *
x=zeros(100,dtype='gpufloat') # Creates an array of 100 elements on the GPU
y=ones(100,dtype='gpufloat')
z=exp(2*x+y) # z in on the GPU, all operations on CPU with no transfer
z_cpu=array(z,dtype='float') # z is copied to the CPU
i=(z>2.3).nonzero()[0] # operation on GPU, returns a CPU integer array

Existing packages
=================
* PyCuda: http://mathema.tician.de/software/pycuda
and PyOpenCL: http://pypi.python.org/pypi/pyopencl/
There is a GPUArray which does what we want, but there is probably no
lazy evaluation (so it is probably slow).

* GpuPy: http://www.tricity.wsu.edu/~bobl/personal/mypubs/2009_gpupy_toms.pdf
It is based on OpenGL/CG and uses lazy evaluation. It is very close to what we
want. Unfortunately there is no public release, and
the first author has no email address, the second author does not reply.

* Cuda-ndarray: http://code.google.com/p/cuda-ndarray/
It uses CUDA and is based on Theano, which does lazy evaluation.
It is currently undocumented and there is no Windows build, but the
project looks cool.

* Gpulib: http://gpulib.txcorp.com/
does not seem to maintain the Python interface.

* Matlab toolbox: http://www.accelereyes.com/

Low-level implementation
========================
CUDA / OpenCL: OpenCL probably has more future than CUDA and it is more
universal.

Functionality
=============
* element-wise operations on vectors (arithmetical, exp/log, exponentiation)
* same but on views (x[2:7])
* assignment (x[:]=2*y)
* boolean operations on vectors (x>2.5) and the nonzero() method
* multiplying a N*M matrix by an M*M matrix, where N is large and M is
small (but this could be done with vector operations).
* random number generation (gpurand(N))
The syntax must be the same as with normal arrays.
